0.1 Overview

In this companion guide, we illustrate scenarios in which difference in differences analysis can be applied. We start with the simplest of scenarios and slowly amp up the complexity. For each scenario we describe a target parameter to estimate and the parameter estimated by a two-way fixed effects (TWFE) model. We also apply the Goodman-Bacon decomposition to this parameter to determine if the TWFE estimate is partially composed of estimates that are “forbidden” (e.g., ones that compare a newly treated state to a previously treated state) and apply alternate estimation approaches.

The goal is to illustrate scenarios where the usual TWFE method of estimation provides suitable results and scenarios where this approach is biased. In the latter case, we show alternative methods to estimation to overcome the issue.

In the following examples, state policy changes are the effects of interest.

0.1.1 Load the R packages used in this analysis

First, load the packages for reading and plotting the data. The packages bacondecomp and did are specific to DID analyses.

# you will need to install these packages if you don't have them already
# install.packages(c("here", "readxl", "tidyverse", "patchwork", "magrittr", 
# "broom", "ggrepel",  "bacondecomp", "did", "staggered"))

library(here)
library(readxl)
library(tidyverse)
library(patchwork)
library(magrittr)
library(broom)
library(ggrepel)
library(bacondecomp)
library(did)
library(staggered)

0.2 Scenario 1 (2 states by 2 time points)

We first consider the simplest case in which there is one state that never adopts the policy (the never-treated group), and one state that does implement the policy (the group that undergoes treatment). The groups are observed during two time periods only.

Below, the data are read in from an excel spreadsheet. Feel free to view the data in the spreadsheet or use an R View()er window. The data contain four variables state, time, policy, and outcome.

  • state: ID for the grouping variable
  • time: Time index, begins at 1
  • policy: Indicator variable for exposure to the policy change
  • outcome: The outcome variable
s1 <- read_xlsx(here("data", "scenarios.xlsx"), sheet = "scen1",
                col_types = c("text", "text", "text", "numeric"))
# str(s1)
# View(s1)

Note that I specified the column types for each variable. I read in the state, time, and policy variables as “text” so that R will consider them as categorical/factor variables in the analysis (or, using econ-speak using “fixed effects”).

0.2.1 Visualization of Scenario 1

The following can be read from the labelled plot:

  • Pre-treatment difference of 1
  • Post-treatment difference of 3
  • Causal effect of policy = 3-1 = 2 or equivalently,
  • State 1 difference of 5
  • State 2 difference of 3
  • Causal effect of policy = 5-3 = 2 under the following assumptions

Three standard assumptions are typical stated in DID papers and required for the DID estimate [in the simple case?] to estimate the causal effect of interest:

  1. Parallel trends, or counterfactual parallel trends as described by Ben-Michael: Had the treated group not been treated, the trends for the treated state would be the same as the trends for the never-treated state.
  2. No anticipation: The effect of the policy change begins after it is introduced; there is no anticipatory effect before the policy is introduced. Anticipatory effects can occur if individuals change their behavior in anticipation of a policy change.
  3. [NAME FOR THIS ONE:] The policy is the only thing that changes at the time of treatment that affects the outcome

If you’re familiar with causal inference, you will also know about the main assumptions required when estimating causal effects. Ben-Micheal et al. expound on these in their paper, and we expand on them here:

  1. Consistency: The policy change is the same across treated units. For example, if states introduced a texting ban while driving and these bans were associated with different penalties, then the policy change would not meet the consistency assumption because the different penalties may have varying effects on the outcomes of interest.
  2. No interference: Outcomes of individuals in one state do not affect outcomes of individuals in other states.
  3. Exchangeability: Had the treated states been untreated, they would have experienced the trend in outcomes experienced by the untreated group. This causal assumption is the same as the counterfactual parallel trends assumption [DANA DO YOU AGREE?].
  4. Positivity: Each state has a non-zero probability of being treated. [TO DISCUSS: See Ben-Micheal paper - they say that positivity is not a required assumption for standard DID]

WHERE TO PUT THIS ONE IF WE TALK ABOUT IT?

  1. Correct model specification: SOMETHING ABOUT LINEARITY? ALSO RELATED TO THE PARALLEL TRENDS ASSUMPTION

0.2.2 Two-way fixed effected (TWFE) regression model

Under the canonical TWFE design, we can estimate the policy effect by including a fixed effect (FE) for state, a FE for time, and an indicator for the policy change. The effect estimate is the regression coefficient on the policy variable.

s1_mod <- lm(outcome ~ state + time + policy, data = s1)
tidy(s1_mod)
## # A tibble: 4 × 5
##   term        estimate std.error statistic p.value
##   <chr>          <dbl>     <dbl>     <dbl>   <dbl>
## 1 (Intercept)     9          NaN       NaN     NaN
## 2 state2          1.00       NaN       NaN     NaN
## 3 time2           3          NaN       NaN     NaN
## 4 policy1         2.00       NaN       NaN     NaN

The coefficient on the policy term is 2 showing that the regression model estimate equalled the causal effect.

Takeaway: When there are only two treatment groups and two time points, where one of the groups becomes treated, you can calculate the DID estimate by hand or using TWFE regression. Note that there are only 4 rows of data and no sampling variability so no measures of variation can be estimated in this setting.

0.3 Scenario 2A and 2B (3 states, homogeneous and heterogeneous treatment effects)

This scenario is similar to Scenario 1, except now there are two states that undergo treatment. In Scenario 2A, the magnitude of the treatment effect is the same in both states. In Scenario 2B, the magnitude of the treatment effect varies by state.

s2a <- read_xlsx(here("data", "scenarios.xlsx"), sheet = "scen2a",  
                 col_types = c("text", "text", "text", "numeric"))

s2b <- read_xlsx(here("data", "scenarios.xlsx"), sheet = "scen2b",
                  col_types = c("text", "text", "text", "numeric"))

s2a %<>% mutate(ever_trt = case_when(state %in% c("2", "3") ~ "treated",
                                     state == "1" ~ "never-treated"))

s2b %<>% mutate(ever_trt = case_when(state %in% c("2", "3") ~ "treated",
                                     state == "1" ~ "never-treated"))

0.3.1 Visualization of Scenario 2

The following can be read from the labelled plot for Scenario 2A:

  • Post-Pre Difference for never-treated state 1: 12-9=3
  • Post-Pre Difference for treated state 2: 15-10=5
  • Post-Pre Difference for treated state 3: 16-11=5
  • Causal effect of policy = 5-3 for both states for an average treatment effect on the treated (ATT) of 2

The following can be read from the labelled plot for Scenario 2B:

  • Post-Pre Difference for never-treated state 1: 12-9=3
  • Post-Pre Difference for treated state 2: 15-10=5
  • Post-Pre Difference for treated state 3: 17-11=6
  • Causal effect of policy = 2 for state 2 vs 1 and 3 for state 3 vs 1 for an ATT of 2.5

0.3.2 TWFE Regression model

s2a_mod <- lm(outcome ~ policy + state + time, data = s2a)
tidy(s2a_mod)
## Warning in summary.lm(x): essentially perfect fit: summary may be unreliable
## # A tibble: 5 × 5
##   term        estimate std.error statistic p.value
##   <chr>          <dbl>     <dbl>     <dbl>   <dbl>
## 1 (Intercept)     9            0       Inf       0
## 2 policy1         2.00         0       Inf       0
## 3 state2          1.00         0       Inf       0
## 4 state3          2.00         0       Inf       0
## 5 time2           3            0       Inf       0

The coefficient estimate equals to 2 the ATT for Scenario 2a.

s2b_mod <- lm(outcome ~ policy + state + time, data = s2b)
tidy(s2b_mod)
## # A tibble: 5 × 5
##   term        estimate std.error statistic p.value
##   <chr>          <dbl>     <dbl>     <dbl>   <dbl>
## 1 (Intercept)    9         0.500     18.0   0.0353
## 2 policy1        2.5       0.866      2.89  0.212 
## 3 state2         0.750     0.661      1.13  0.460 
## 4 state3         2.25      0.661      3.40  0.182 
## 5 time2          3         0.707      4.24  0.147

The coefficient estimate equals to 2.5 the ATT for Scenario 2b.

Takeaway: When there are multiple treatment groups and two time points, where >1 of the groups becomes treated, you can calculate the DID estimate by hand or using TWFE regression. If treatment effects are heterogeneous across states, then the estimated effect is the average ATT across the states

Things to discuss with Dana:

  • The state FEs are a bit off (FE for state 3 should be 2, for state 2 should be 1)
  • Interesting to think about differences in the p-value for scen 2a (very small) vs 2b (big)

0.4 Scenario 3A and 3B (Two states, dynamic treatment effect)

In scenario 3, the number of time periods is increased to incorporate 5 time points before and after a policy change. One never-treated and one treated group are considered. In Scenario 3A, the effect of treatment is constant once treatment is introduced. In Scenario 3B, the effect of treatment increases with time (i.e., it is dynamic).

Because there are multiple time points, when we read in the data we update the column type for the time variable to be numeric. This will help with plotting the data. Later, when we run the TWFE model, we will use factor(time) to model time using indicator variables (i.e., time fixed effects) rather than including time as a linear term.

s3a <- read_xlsx(here("data", "scenarios.xlsx"), sheet = "scen3a",
                 col_types = c("text", "numeric", "text", "numeric", "text"))

s3b <- read_xlsx(here("data", "scenarios.xlsx"), sheet = "scen3b",
                 col_types = c("text", "numeric", "text", "numeric", "text"))

In addition to state, time, policy, and outcome, the data contains another variable time_since_policy which equals 0 before treatment (and always equals 0 for the never-treated), and counts the time since treatment in the treated state.

## `summarise()` has grouped output by 'state'. You can override using the `.groups` argument.
## `summarise()` has grouped output by 'state'. You can override using the `.groups` argument.

0.4.1 Visualization of Scenario 3

The following can be read from the labelled plot for Scenario 3A:

  • Pre-treatment difference is 17-15=2
  • Post-treatment difference is 34-30=4
  • DID = 2

The following can be read from the labelled plot for Scenario 3B:

  • Pre-treatment difference is 17-15=2
  • Post-treatment difference is 38-30=8
  • DID = 6

For Scenario 3B the effect is dynamic, and rather than calculating an ATT that aggregates over post-time, it might be beneficial to calculate the effect separately by time since treatment. Here, the effect increases over time, suggesting that the policy change takes time to “kick-in”. In contrast, some policies may be associated with large initial effects that attenuate over time. Knowing if a policy introduction is associated with a changing effect over time is therefore important for understanding the real-world effects of the policy.

Thus, in this scenario we first estimate the average treatment effect using a TWFE model, and then we estimate the dynamic effect by extending the TWFE model to include a categorical indicator for time since the policy’s introduction.

0.4.2 TWFE Regression Model

Estimation of the dynamic effect in Scenario 3A:

s3a_mod <- lm(outcome ~ policy + state + factor(time), 
             data = s3a)
tidy(s3a_mod)
## Warning in summary.lm(x): essentially perfect fit: summary may be unreliable
## # A tibble: 12 × 5
##    term           estimate std.error statistic   p.value
##    <chr>             <dbl>     <dbl>     <dbl>     <dbl>
##  1 (Intercept)        9.00  4.39e-15   2.05e15 3.62e-120
##  2 policy1            2.00  5.07e-15   3.94e14 1.92e-114
##  3 state2             2.00  3.59e-15   5.57e14 1.20e-115
##  4 factor(time)2      3.00  5.67e-15   5.29e14 1.83e-115
##  5 factor(time)3      6.00  5.67e-15   1.06e15 7.16e-118
##  6 factor(time)4      9     5.67e-15   1.59e15 2.79e-119
##  7 factor(time)5     12     5.67e-15   2.12e15 2.80e-120
##  8 factor(time)6     15     6.21e-15   2.41e15 9.73e-121
##  9 factor(time)7     18     6.21e-15   2.90e15 2.26e-121
## 10 factor(time)8     21     6.21e-15   3.38e15 6.59e-122
## 11 factor(time)9     24     6.21e-15   3.86e15 2.26e-122
## 12 factor(time)10    27     6.21e-15   4.34e15 8.83e-123

The coefficient of the policy term is 2 showing that the regression model estimate equaled the causal effect.

Estimation of the dynamic effect in Scenario 3B:

s3b_mod <- lm(outcome ~ policy + state + factor(time), 
             data = s3b)
tidy(s3b_mod)
## # A tibble: 12 × 5
##    term           estimate std.error statistic     p.value
##    <chr>             <dbl>     <dbl>     <dbl>       <dbl>
##  1 (Intercept)        9.00      1.22      7.35 0.0000801  
##  2 policy1            6         1.41      4.24 0.00283    
##  3 state2             2.00      1.00      2.00 0.0805     
##  4 factor(time)2      3.00      1.58      1.90 0.0943     
##  5 factor(time)3      6.00      1.58      3.79 0.00528    
##  6 factor(time)4      9.00      1.58      5.69 0.000459   
##  7 factor(time)5     12         1.58      7.59 0.0000637  
##  8 factor(time)6     13.0       1.73      7.51 0.0000689  
##  9 factor(time)7     17         1.73      9.81 0.00000976 
## 10 factor(time)8     21         1.73     12.1  0.00000198 
## 11 factor(time)9     25.0       1.73     14.4  0.000000519
## 12 factor(time)10    29.0       1.73     16.7  0.000000164

The coefficient of the policy term is 6 showing that the regression model estimate equaled the causal effect.

0.4.3 TWFE Regression Model with time_since_policy specification

Rather than using a binary indicator for the policy change, it can be coded using the time_since_policy variable. If we include this as a factor variable in the model, then separate effects will be estimated for each time since treatment.

For Scenario 3A, using this variable should yield 2 for each time since treatment, since the treatment effect is constant over time.

s3a_mod2 <- lm(outcome ~ time_since_policy + state + factor(time), 
             data = s3a)
tidy(s3a_mod2)
## Warning in summary.lm(x): essentially perfect fit: summary may be unreliable
## # A tibble: 16 × 5
##    term               estimate std.error statistic  p.value
##    <chr>                 <dbl>     <dbl>     <dbl>    <dbl>
##  1 (Intercept)            9.00  7.57e-15   1.19e15 3.00e-60
##  2 time_since_policy1     2.00  1.51e-14   1.32e14 1.97e-56
##  3 time_since_policy2     2.00  1.51e-14   1.32e14 1.97e-56
##  4 time_since_policy3     2.00  1.51e-14   1.32e14 1.97e-56
##  5 time_since_policy4     2.00  1.51e-14   1.32e14 1.97e-56
##  6 time_since_policy5     2.00  1.51e-14   1.32e14 1.97e-56
##  7 state2                 2.00  6.18e-15   3.24e14 5.47e-58
##  8 factor(time)2          3.00  9.77e-15   3.07e14 6.76e-58
##  9 factor(time)3          6.00  9.77e-15   6.14e14 4.22e-59
## 10 factor(time)4          9     9.77e-15   9.21e14 8.34e-60
## 11 factor(time)5         12     9.77e-15   1.23e15 2.64e-60
## 12 factor(time)6         15.0   1.24e-14   1.21e15 2.77e-60
## 13 factor(time)7         18.0   1.24e-14   1.46e15 1.33e-60
## 14 factor(time)8         21.0   1.24e-14   1.70e15 7.20e-61
## 15 factor(time)9         24     1.24e-14   1.94e15 4.22e-61
## 16 factor(time)10        27.0   1.24e-14   2.18e15 2.64e-61

Indeed, this is what we see!

For Scenario 3B, the effect increases by 2 units for every unit of time since treatment is initiated so this should be reflected in the coefficient estimates.

s3b_mod2 <- lm(outcome ~ time_since_policy + state + factor(time), 
             data = s3b)
tidy(s3b_mod2)
## Warning in summary.lm(x): essentially perfect fit: summary may be unreliable
## # A tibble: 16 × 5
##    term               estimate std.error statistic  p.value
##    <chr>                 <dbl>     <dbl>     <dbl>    <dbl>
##  1 (Intercept)            9.00  1.05e-14   8.56e14 1.12e-59
##  2 time_since_policy1     2.00  2.10e-14   9.51e13 7.33e-56
##  3 time_since_policy2     4.00  2.10e-14   1.90e14 4.58e-57
##  4 time_since_policy3     6.00  2.10e-14   2.85e14 9.05e-58
##  5 time_since_policy4     8.00  2.10e-14   3.80e14 2.86e-58
##  6 time_since_policy5    10.0   2.10e-14   4.76e14 1.17e-58
##  7 state2                 2.00  8.58e-15   2.33e14 2.04e-57
##  8 factor(time)2          3.00  1.36e-14   2.21e14 2.51e-57
##  9 factor(time)3          6.00  1.36e-14   4.42e14 1.57e-58
## 10 factor(time)4          9.00  1.36e-14   6.63e14 3.10e-59
## 11 factor(time)5         12.0   1.36e-14   8.84e14 9.82e-60
## 12 factor(time)6         15.0   1.72e-14   8.74e14 1.03e-59
## 13 factor(time)7         18.0   1.72e-14   1.05e15 4.97e-60
## 14 factor(time)8         21.0   1.72e-14   1.22e15 2.68e-60
## 15 factor(time)9         24     1.72e-14   1.40e15 1.57e-60
## 16 factor(time)10        27.0   1.72e-14   1.57e15 9.81e-61

For this scenario, the effect estimate is 2 in the first time after treatment, then 4, and so on.

Note that we could have modeled time_since_policy linearly, however including time_since_policy as a factor variable allows the estimation of effects to be non-linear over time.

0.5 Scenario 4 (4 states, hetergenous treatment effects)

Scenario 4 extends Scenario 3 to the multiple group setting. State 1 is never-treated, while states 2-4 undergo treatment at time=6. The effects are not dynamic in time, but they are heterogeneous with some states having a larger treatment effect

s4 <- read_xlsx(here("data", "scenarios.xlsx"), sheet = "scen4",
                col_types = c("text", "numeric", "text", "numeric", "text", "numeric"))
## `summarise()` has grouped output by 'state'. You can override using the `.groups` argument.

0.5.1 Visualization of Scenario 4

The following can be read from the labelled plot:

  • Pre-post difference for never-treated state 1: 30-15 = 15

  • Pre-post difference for treated state 2: 34-17 = 17

  • Pre-post difference for treated state 3: 35-19 = 16

  • Pre-post difference for treated state 4: 39-21 = 18

  • \(DID_{2 vs. 1} = 17-15 = 2\)

  • \(DID_{3 vs. 1} = 16-15 = 1\)

  • \(DID_{4 vs. 1} = 18-15 = 3\)

  • \(ATT = 2\)

0.5.2 TWFE Regression Model

s4_mod <- lm(outcome ~ policy + state + factor(time), 
             data = s4)
tidy(s4_mod)
## # A tibble: 14 × 5
##    term           estimate std.error statistic  p.value
##    <chr>             <dbl>     <dbl>     <dbl>    <dbl>
##  1 (Intercept)        9.00     0.277     32.4  1.45e-22
##  2 policy1            2.00     0.320      6.24 1.31e- 6
##  3 state2             2.00     0.253      7.90 2.24e- 8
##  4 state3             3.50     0.253     13.8  1.71e-13
##  5 state4             6.50     0.253     25.7  5.34e-20
##  6 factor(time)2      3.00     0.310      9.67 4.20e-10
##  7 factor(time)3      6.00     0.310     19.3  5.83e-17
##  8 factor(time)4      9        0.310     29.0  2.44e-21
##  9 factor(time)5     12        0.310     38.7  1.63e-24
## 10 factor(time)6     15.0      0.392     38.2  2.20e-24
## 11 factor(time)7     18.0      0.392     45.9  2.06e-26
## 12 factor(time)8     21.0      0.392     53.5  3.89e-28
## 13 factor(time)9     24.0      0.392     61.2  1.24e-29
## 14 factor(time)10    27.0      0.392     68.8  5.91e-31

The coefficient of the policy term is 2 showing that the regression model estimates the ATT.

One of the issues with TWFE estimation is that contrasts between earlier-treated and later-treated states contribute to the coefficient estimate of the policy term. While all states are treated at the same time in Scenario 4, we can use the Goodman-Bacon decomposition function bacon() to partition the effect estimate into weighted component parts.

Before using the bacon() function, we need to modify some of the variables that are stored as characters by R to be stored as numeric:

s4 %<>% mutate(state_n = as.numeric(as.character(state)),
               policy_n = as.numeric(as.character(policy)))

The bacon function has an argument called formula, where you list the outcome as a function of the policy change, i.e., formula = outcome ~ policy_n. Note that no fixed effects are included in the formula. The time and state fixed effects are specified by the time_varand id_var arguments.

s4_bacon <- bacon(formula = outcome ~ policy_n, 
                  data = s4,
                  id_var = "state_n",
                  time_var = "time")
##                   type weight avg_est
## 1 Treated vs Untreated      1       2

This small table illustrates that 100% of the weight is put on comparisons between treated and untreated states and the ATT equals 2.

For more detail we can print the object:

s4_bacon 
##   treated untreated estimate weight                 type
## 2       6     99999        2      1 Treated vs Untreated

treated=6 says that all treated states start treatment at time = 6 and are compared to untreated states (with a fictitious treatment time of 99999).

The results from the Goodman-Bacon Decomposition are reassuring and tell us we can trust the TWFE estimate. For pedagodgical purposes, we also show the estimate if we applied the Callaway Sant’Anna estimator using the att_gt function to estimate group-time ATEs.

Like bacon(), att_gt() requires all numeric variables. It also requires an argument called gname. gname expects a variable which encodes for each state the time of first policy change. For never-treated state 1, this variable equals 0 and for treated states 2-4 this variable equals 6.

s4_cs <- att_gt(yname = "outcome", 
             tname = "time", 
             idname = "state_n", 
             gname = "time_first_treat", 
             data = s4, 
             anticipation = 0)
## Warning in pre_process_did(yname = yname, tname = tname, idname = idname, : Be aware that there are some small groups in your dataset.
##   Check groups: 0,6.
## Warning in att_gt(yname = "outcome", tname = "time", idname = "state_n", : Not
## returning pre-test Wald statistic due to singular covariance matrix
m2_ag <- aggte(s4_cs, type="simple")
summary(m2_ag)
## 
## Call:
## aggte(MP = s4_cs, type = "simple")
## 
## Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Forthcoming at the Journal of Econometrics <https://arxiv.org/abs/1803.09015>, 2020. 
## 
## 
## Overall ATT:  
##  ATT Std. Error     [95%  Conf. Int.]  
##    2          0         2           2 *
## 
## 
## ---
## Signif. codes: `*' confidence band does not cover 0
## 
## Control Group:  Never Treated,  Anticipation Periods:  0
## Estimation Method:  Doubly Robust

Here, we see the ATT is equal to as estimated by the TWFE model.

0.6 Scenario 5 (4 states, staggered treatment timing)

Scenario 5 introduces staggered timing. In this example, the treated states introduce the policy at two different time points. The treatment effects are non-dynamic and there are two never-treated states and two ever-treated states.

s5 <- read_xlsx(here("data", "scenarios.xlsx"), sheet = "scen5",
                col_types = c("text", "numeric", "text", "numeric", "text", "numeric"))

0.6.1 Visualization of Scenario 5

Visualization of all comparisons made by TWFE regression

Contrast 1:

  • Pre-post difference for never-treated state 1 is: 43.5-13.5=30
  • Pre-post difference for never-treated state 2 is: 44.5-14.5=30
  • Implying the average pre-post difference equals 30 for the never-treated states
  • Pre-post difference for treated state 3 is 49.5-16.5 = 33
  • \(DID_{3 vs 1,2} = 33-30=3\)

Contrast 2:

  • Pre-post difference for never-treated state 1 is: 54-24=30
  • Pre-post difference for never-treated state 2 is: 55-25=30
  • Implying the average pre-post difference equals 30 for the never-treated states
  • Pre-post difference for treated state 4 is 66-33= 33
  • \(DID_{4 vs 1,2} = 33-30=3\)

Contrast 3:

  • Pre-post difference for later-treated state 4 is: 39-22.5=16.5
  • Pre-post difference for earlier-treated state 2 is: 36-16.5=19.5
  • \(DID_{3 vs 4} = 19.5-16.5=3\)

Contrast 4:

  • Pre-post difference for earlier-treated state 2 is: 60-36=24
  • Pre-post difference for later-treated state 4 is: 66-39=27
  • \(DID_{4 vs 3} = 27-24=3\)
s5_mod <- lm(outcome ~  policy + state + factor(time), data = s5)
tidy(s5_mod)
## Warning in summary.lm(x): essentially perfect fit: summary may be unreliable
## # A tibble: 24 × 5
##    term          estimate std.error statistic p.value
##    <chr>            <dbl>     <dbl>     <dbl>   <dbl>
##  1 (Intercept)       9.00  1.65e-14   5.46e14       0
##  2 policy1           3.00  1.31e-14   2.30e14       0
##  3 state2            1.00  9.43e-15   1.06e14       0
##  4 state3            3.00  1.41e-14   2.13e14       0
##  5 state4            9.00  1.11e-14   8.10e14       0
##  6 factor(time)2     3.00  2.11e-14   1.42e14       0
##  7 factor(time)3     6.00  2.11e-14   2.85e14       0
##  8 factor(time)4     9.00  2.11e-14   4.27e14       0
##  9 factor(time)5    12     2.13e-14   5.63e14       0
## 10 factor(time)6    15.0   2.13e-14   7.03e14       0
## # … with 14 more rows

The coefficient of the policy term is 3 showing that the regression model estimates the ATT.

Using the bacon()function, we can see see the four comparisons being made and the effect estimates for each comparison. The weight column in the output shows how much weight each estimate contributes to the ATT. Here, the treated vs untreated comparisons are the most heavily weighted.

s5 %<>% mutate(state_n = as.numeric(as.character(state)),
               policy_n = as.numeric(as.character(policy)))
#bacon decomp says no problems
#only comparing treated vs untreated and the average effect estimate is 2
s5_bacon <- bacon(outcome ~ policy_n,
      data = s5,
      id_var = "state_n",
      time_var = "time")
##                       type  weight avg_est
## 1 Earlier vs Later Treated 0.06715       3
## 2 Later vs Earlier Treated 0.15108       3
## 3     Treated vs Untreated 0.78177       3

We can also see a list of each comparison and it’s weight:

s5_bacon
##   treated untreated estimate     weight                     type
## 2       5     99999        3 0.30695444     Treated vs Untreated
## 3      12     99999        3 0.47482014     Treated vs Untreated
## 6      12         5        3 0.15107914 Later vs Earlier Treated
## 8       5        12        3 0.06714628 Earlier vs Later Treated

For pedagogical purposes, we also show the estimate if we applied the Callaway Sant’Anna estimator using the att_gt() function to estimate group-time ATTs.

s5_cs <- att_gt(yname = "outcome", 
             tname = "time", 
             idname = "state_n", 
             gname = "time_first_treat", 
             data = s5, 
             anticipation = 0)
## Warning in pre_process_did(yname = yname, tname = tname, idname = idname, : Be aware that there are some small groups in your dataset.
##   Check groups: 0,5,12.
## Warning in att_gt(yname = "outcome", tname = "time", idname = "state_n", : Not
## returning pre-test Wald statistic due to singular covariance matrix
s5_cs_ag <- aggte(s5_cs, type="simple")
summary(s5_cs_ag)
## 
## Call:
## aggte(MP = s5_cs, type = "simple")
## 
## Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Forthcoming at the Journal of Econometrics <https://arxiv.org/abs/1803.09015>, 2020. 
## 
## 
## Overall ATT:  
##  ATT Std. Error     [95%  Conf. Int.]  
##    3          0         3           3 *
## 
## 
## ---
## Signif. codes: `*' confidence band does not cover 0
## 
## Control Group:  Never Treated,  Anticipation Periods:  0
## Estimation Method:  Doubly Robust

Here, we see the ATT is equal to 3 as estimated by the TWFE model.

0.7 Scenario 6 (4 states, staggered timing, dynamic effects)

Scenario 6 is similar to Scenario 5 except now the treatment effect increases as time goes on, i.e., it is dynamic. The magnitude of the treatment effect is the same for both treated groups.

s6 <- read_xlsx(here("data", "scenarios.xlsx"), sheet = "scen6", 
                col_types = c("text", "numeric", "text", "numeric", "text", "numeric"))

Visualization of all comparisons made by TWFE regression

Contrast 1:

  • Pre-post difference for never-treated state 1 is: 43.5-13.5=30
  • Pre-post difference for never-treated state 2 is: 44.5-14.5=30
  • Implying the average pre-post difference equals 30 for the never-treated states
  • Pre-post difference for treated state 3 is 57-16.5 = 40.5
  • \(DID_{3 vs 1,2} = 40.5-30 = 10.5\)

Contrast 2:

  • Pre-post difference for never-treated state 1 is: 54-24=30
  • Pre-post difference for never-treated state 2 is: 55-25=30
  • Implying the average pre-post difference equals 30 for the never-treated states
  • Pre-post difference for treated state 4 is 70-33= 37
  • \(DID_{4 vs 1,2} = 37-30 = 7\)

Contrast 3:

  • Pre-post difference for later-treated state 4 is: 39-22.5=16.5
  • Pre-post difference for earlier-treated state 3 is: 39-16.5=22.5
  • \(DID_{3 vs 4} = 22.5-16.5 = 6\)

Contrast 4:

  • Pre-post difference for earlier-treated state 3 is: 71-39=32
  • Pre-post difference for later-treated state 4 is: 70-39=31
  • \(DID_{4 vs 3} = 31-32 = -1\)

Contrasts 1 and 2 are okay because they are clean comparisons between never-treated and treated states. Contrasts 3 and 4 are trickier. Contrast 3 compares the earlier treated state to the later treated state. It is okay because parallel trends holds and it only uses the time before the later state is treated to inform the estimation. Contrast 4 is the problem – it uses post-treatment time from state 3 as the “pre” time for the comparison. The issue is that this post time includes the effects of the treatment, and since the effect is dynamic it does not satisfy the parallel trends assumption. However this contrast still contributes to the TWFE regression estimate!

Let’s take a look…

s6_mod <- lm(outcome ~ policy + state + factor(time), 
             data = s6)
tidy(s6_mod)
## # A tibble: 24 × 5
##    term          estimate std.error statistic  p.value
##    <chr>            <dbl>     <dbl>     <dbl>    <dbl>
##  1 (Intercept)       8.24     1.28       6.44 2.90e- 8
##  2 policy1           6.80     1.01       6.72 1.02e- 8
##  3 state2            1.00     0.731      1.37 1.77e- 1
##  4 state3            5.96     1.09       5.46 1.11e- 6
##  5 state4            9.09     0.861     10.6  6.23e-15
##  6 factor(time)2     3.00     1.63       1.84 7.17e- 2
##  7 factor(time)3     6.00     1.63       3.67 5.41e- 4
##  8 factor(time)4     9        1.63       5.51 9.51e- 7
##  9 factor(time)5    11.1      1.65       6.68 1.16e- 8
## 10 factor(time)6    14.3      1.65       8.65 6.76e-12
## # … with 14 more rows

The coefficient of the policy term equals 6.8. We can now use the Goodman Bacon decomposition to see how much weight each of the contrasts contributes to the effect estimate:

s6 %<>% mutate(state_n = as.numeric(as.character(state)),
               policy_n = as.numeric(as.character(policy)))

s6_bacon <- bacon(outcome ~ policy_n,
      data = s6,
      id_var = "state_n",
      time_var = "time")
##                       type  weight  avg_est
## 1 Earlier vs Later Treated 0.06715  6.00000
## 2 Later vs Earlier Treated 0.15108 -1.00000
## 3     Treated vs Untreated 0.78177  8.37423

It’s more helpful to view the four contrasts separately:

s6_bacon
##   treated untreated estimate     weight                     type
## 2       5     99999     10.5 0.30695444     Treated vs Untreated
## 3      12     99999      7.0 0.47482014     Treated vs Untreated
## 6      12         5     -1.0 0.15107914 Later vs Earlier Treated
## 8       5        12      6.0 0.06714628 Earlier vs Later Treated

So we can see the estimates of the treatment effect for each contrast are as we calculated them earlier.

We can also confirm that you get the TWFE estimate by taking the weighted average of the Goodman-Bacon decomposition components:

#see the TWFE estimate
sum(s6_bacon$estimate*s6_bacon$weight)
## [1] 6.798561

Okay, so we know the TWFE regression estimate is wrong, in particular because it is incorporating a contrast that we wouldn’t want to make in practice between an earlier treated and a later treated group. To overcome this we can use the Callaway and Sant’Anna estimator.

Like in the earlier examples we start by running the att_gt() function. Unlike before, we display the output from this step using summary(). This output shows the estimated ATT for each combination of treated state and time. The time is long becuase it includes estimates for pre-policy time, which helps with the evaluation of the parallel trends assumption or to see if there are any lead effects (“anticipation”) of the treatment on the outcome. You wouldn’t usually report this entire table, but it worth showing here to see how the estimated effect is dynamic and that decisions need to be make about how to report dynamics effects including whether or not to aggregate the effect, and if so, at what level.

s6_cs <- att_gt(yname = "outcome", 
             tname = "time", 
             idname = "state_n", 
             gname = "time_first_treat", 
             data = s6, 
             anticipation = 0)
## Warning in pre_process_did(yname = yname, tname = tname, idname = idname, : Be aware that there are some small groups in your dataset.
##   Check groups: 0,5,12.
## Warning in att_gt(yname = "outcome", tname = "time", idname = "state_n", : Not
## returning pre-test Wald statistic due to singular covariance matrix
summary(s6_cs)
## 
## Call:
## att_gt(yname = "outcome", tname = "time", idname = "state_n", 
##     gname = "time_first_treat", data = s6, anticipation = 0)
## 
## Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Forthcoming at the Journal of Econometrics <https://arxiv.org/abs/1803.09015>, 2020. 
## 
## Group-Time Average Treatment Effects:
##  Group Time ATT(g,t) Std. Error [95% Simult.  Conf. Band]   
##      5    2        0          0            NA          NA NA
##      5    3        0          0            NA          NA NA
##      5    4        0          0            NA          NA NA
##      5    5        3          0            NA          NA NA
##      5    6        4          0            NA          NA NA
##      5    7        5          0            NA          NA NA
##      5    8        6          0            NA          NA NA
##      5    9        7          0            NA          NA NA
##      5   10        8          0            NA          NA NA
##      5   11        9          0            NA          NA NA
##      5   12       10          0            NA          NA NA
##      5   13       11          0            NA          NA NA
##      5   14       12          0            NA          NA NA
##      5   15       13          0            NA          NA NA
##      5   16       14          0            NA          NA NA
##      5   17       15          0            NA          NA NA
##      5   18       16          0            NA          NA NA
##      5   19       17          0            NA          NA NA
##      5   20       18          0            NA          NA NA
##     12    2        0          0            NA          NA NA
##     12    3        0          0            NA          NA NA
##     12    4        0          0            NA          NA NA
##     12    5        0          0            NA          NA NA
##     12    6        0          0            NA          NA NA
##     12    7        0          0            NA          NA NA
##     12    8        0          0            NA          NA NA
##     12    9        0          0            NA          NA NA
##     12   10        0          0            NA          NA NA
##     12   11        0          0            NA          NA NA
##     12   12        3          0            NA          NA NA
##     12   13        4          0            NA          NA NA
##     12   14        5          0            NA          NA NA
##     12   15        6          0            NA          NA NA
##     12   16        7          0            NA          NA NA
##     12   17        8          0            NA          NA NA
##     12   18        9          0            NA          NA NA
##     12   19       10          0            NA          NA NA
##     12   20       11          0            NA          NA NA
## ---
## Signif. codes: `*' confidence band does not cover 0
## 
## Control Group:  Never Treated,  Anticipation Periods:  0
## Estimation Method:  Doubly Robust

Callaway and Sant’Anna provide many options for aggregating the group-time effects. The simplest option is specify type = simple in the attge() function. This estimate considered only the clean contrasts (Comparisons 1 and 2 in the figure above) and combines them by weighting by the total time in the post-treatment period. For contrast 1, there are 16 time periods post treatment and for contrast 2, there are 9 periods post-treatment. Thus this estimate is the weighted average: \(10.5\times(16/25) + 7\times(9/25)=9.24\)

#aggregate the group time average treatment effects
#type = "simple": weighted average of all group-time average treatment effects
#with weights proportional to the group size
s6_cs_ag <- aggte(s6_cs, type="simple")
summary(s6_cs_ag)
## 
## Call:
## aggte(MP = s6_cs, type = "simple")
## 
## Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Forthcoming at the Journal of Econometrics <https://arxiv.org/abs/1803.09015>, 2020. 
## 
## 
## Overall ATT:  
##   ATT Std. Error     [95%  Conf. Int.]  
##  9.24     1.1956    6.8967     11.5833 *
## 
## 
## ---
## Signif. codes: `*' confidence band does not cover 0
## 
## Control Group:  Never Treated,  Anticipation Periods:  0
## Estimation Method:  Doubly Robust

Alternatively, we can specify type = "group" to get separate effect estimates according to time of implementation. Note that group denotes the time of the policy change for the treated states.

s6_cs_ag2 <- aggte(s6_cs, type = "group")
summary(s6_cs_ag2)
## 
## Call:
## aggte(MP = s6_cs, type = "group")
## 
## Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Forthcoming at the Journal of Econometrics <https://arxiv.org/abs/1803.09015>, 2020. 
## 
## 
## Overall ATT:  
##   ATT Std. Error     [95%  Conf. Int.]  
##  8.75          0      8.75        8.75 *
## 
## 
## Group Effects:
##  group  ATT Std. Error [95% Simult.  Conf. Band]  
##      5 10.5          0          10.5        10.5 *
##     12  7.0          0           7.0         7.0 *
## ---
## Signif. codes: `*' confidence band does not cover 0
## 
## Control Group:  Never Treated,  Anticipation Periods:  0
## Estimation Method:  Doubly Robust

Alternatively, you can estimate the dynamic effect separately for each time since the treatment has been introduced. Remember that this effect estimation is also done for pre-treatment time, so don’t be alarmed by all the rows in this table!

s6_cs_ag3 <- aggte(s6_cs, type="dynamic")
summary(s6_cs_ag3)
## 
## Call:
## aggte(MP = s6_cs, type = "dynamic")
## 
## Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Forthcoming at the Journal of Econometrics <https://arxiv.org/abs/1803.09015>, 2020. 
## 
## 
## Overall ATT:  
##   ATT Std. Error     [95%  Conf. Int.]  
##  10.5          0      10.5        10.5 *
## 
## 
## Dynamic Effects:
##  event time ATT Std. Error [95% Simult.  Conf. Band]   
##         -10   0          0            NA          NA NA
##          -9   0          0            NA          NA NA
##          -8   0          0            NA          NA NA
##          -7   0          0            NA          NA NA
##          -6   0          0            NA          NA NA
##          -5   0          0            NA          NA NA
##          -4   0          0            NA          NA NA
##          -3   0          0            NA          NA NA
##          -2   0          0            NA          NA NA
##          -1   0          0            NA          NA NA
##           0   3          0            NA          NA NA
##           1   4          0            NA          NA NA
##           2   5          0            NA          NA NA
##           3   6          0            NA          NA NA
##           4   7          0            NA          NA NA
##           5   8          0            NA          NA NA
##           6   9          0            NA          NA NA
##           7  10          0            NA          NA NA
##           8  11          0            NA          NA NA
##           9  12          0            NA          NA NA
##          10  13          0            NA          NA NA
##          11  14          0            NA          NA NA
##          12  15          0            NA          NA NA
##          13  16          0            NA          NA NA
##          14  17          0            NA          NA NA
##          15  18          0            NA          NA NA
## ---
## Signif. codes: `*' confidence band does not cover 0
## 
## Control Group:  Never Treated,  Anticipation Periods:  0
## Estimation Method:  Doubly Robust

In this case, it is very helpful to plot the effect estimates over time:

ggdid(s6_cs_ag3)

Lastly, you can use type = "calendar" to aggregate the effects over calendar time. Epidemiologically, this doesn’t make a lot of sense with dynamic effects because it is mixing effect estimates across difference times since treatment was introduced. In some cases, only one comparison is contributing to the estimation (e.g., in calendar time periods where only one group has introduced the treatment), while at other points, two comparisons contribute. But this “knowledge” is lot in the presentation. We don’t show the output from using type = "calendar" but include the code below in case it makes sense for your setting.

#calendar time specific
#not recommended for this setting...but CS say they prefer it here
#https://bcallaway11.github.io/did/articles/did-basics.html
s6_cs_ag4 <- aggte(s6_cs, type = "calendar")
summary(s6_cs_ag4)
ggdid(s6_cs_ag4)

0.7.1 TWFE with time_since_policy specification

But what if we model TWFE with the time_since_change indicators?

s6_mod2 <- lm(outcome ~ time_since_policy + state + factor(time), 
             data = s6)
tidy(s6_mod2) %>% filter(str_detect(term, "time_since")) %>% 
  mutate(time = as.numeric(gsub("time_since_policy", "", term))) %>% 
  arrange(time) 
## Warning in summary.lm(x): essentially perfect fit: summary may be unreliable
## # A tibble: 16 × 6
##    term                estimate std.error statistic p.value  time
##    <chr>                  <dbl>     <dbl>     <dbl>   <dbl> <dbl>
##  1 time_since_policy1      3.00  2.72e-14   1.10e14       0     1
##  2 time_since_policy2      4.00  2.72e-14   1.47e14       0     2
##  3 time_since_policy3      5.00  2.77e-14   1.80e14       0     3
##  4 time_since_policy4      6.00  2.77e-14   2.16e14       0     4
##  5 time_since_policy5      7.00  2.77e-14   2.52e14       0     5
##  6 time_since_policy6      8.00  2.77e-14   2.88e14       0     6
##  7 time_since_policy7      9.00  2.77e-14   3.24e14       0     7
##  8 time_since_policy8     10.0   2.87e-14   3.48e14       0     8
##  9 time_since_policy9     11.0   2.87e-14   3.83e14       0     9
## 10 time_since_policy10    12.0   4.00e-14   3.00e14       0    10
## 11 time_since_policy11    13.0   4.00e-14   3.25e14       0    11
## 12 time_since_policy12    14.0   4.00e-14   3.50e14       0    12
## 13 time_since_policy13    15.0   4.00e-14   3.75e14       0    13
## 14 time_since_policy14    16.0   4.00e-14   4.00e14       0    14
## 15 time_since_policy15    17.0   4.03e-14   4.22e14       0    15
## 16 time_since_policy16    18.0   4.03e-14   4.47e14       0    16

In this setting, the regression model still works! The estimates from the policy indicator variables equal the output from the Callaway Sant’Anna when specified using type = "dynamic"

Takeway: When the treatment effect is staggered and dynamic, you can still capture the effect estimate using a TWFE model so long as the policy effect is modeled using the time since treatment variable. The key question for the researcher is identifying their parameter of interest – are you interested in estimating a dynamic effect, or does a parameter that summarizes over treatment time make sense? I am inclined to start by estimating a dynamic effect to see if the model supports the presence of a dynamic effect. If so, it is important of how the policy operates after being rolled out. If there is no strong support for a dynamic effect, then I would consider estimating the effects separately by timing of introduction if that is sensible for the policy change under study (i.e., if there are only a few separate time points), or estimating an overall summary parameter.

0.8 Scenario 7 (4 states, staggered timing , dynamic and hetergenous treatment effects)

Scenario 7 is like Scenario 6 except now the treatment effect is dynamic AND heterogeneous. In this scenario, the state that implements the policy change first has a stronger treatment effect (i.e., a steeper change in slope)

s7 <- read_xlsx(here("data", "scenarios.xlsx"), sheet = "scen7", 
                col_types = c("text", "numeric", "text", "numeric", "text", "numeric"))

Visualization of all comparisons made by TWFE regression

Contrast 1:

  • Pre-post difference for never-treated state 1 is: 43.5-13.5=30
  • Pre-post difference for never-treated state 2 is: 44.5-14.5=30
  • Implying the average pre-post difference equals 30 for the never-treated states
  • Pre-post difference for treated state 3 is 64.5-16.5 = 48
  • \(DID_{3 vs 1,2} = 48-30 = 18\)

Contrast 2:

  • Pre-post difference for never-treated state 1 is: 54-24=30
  • Pre-post difference for never-treated state 2 is: 55-25=30
  • Implying the average pre-post difference equals 30 for the never-treated states
  • Pre-post difference for treated state 4 is 68-33= 35
  • \(DID_{4 vs 1,2} = 35-30 = 5\)

Contrast 3:

  • Pre-post difference for later-treated state 4 is: 39-22.5=16.5
  • Pre-post difference for earlier-treated state 3 is: 42-16.5=25.5
  • \(DID_{3 vs 4} = 25.5-16.5 = 9\)

Contrast 4:

  • Pre-post difference for earlier-treated state 3 is: 82-42=40
  • Pre-post difference for later-treated state 4 is: 68-39=29
  • \(DID_{4 vs 3} = 29-40 = -11\)

Similar to the previous scenario, parallel trends is satisfied for contrasts 1-3, so we can be okay with these contrasts contributed to the average effect estimate. Like last time, contrast 4 is the problem, and here the problem is intensified because the “control” state – which was treated in an earlier period – is undergoing a large treatment effect in the “pre” period (times 5-11). This makes the pre-post different in the control state larger than the pre-post difference in the comparison state – because the control state is still experiencing a stronger (i.e., steeper slope) treatment effect than the state that is treated at time 12. This makes the DID for this group and time look negative – like the treatment is associated with a decrease in the outcome!

Let’s take a look at the regression output:

s7_mod <- lm(outcome ~ policy + state + factor(time), 
             data = s7)
tidy(s7_mod)
## # A tibble: 24 × 5
##    term          estimate std.error statistic    p.value
##    <chr>            <dbl>     <dbl>     <dbl>      <dbl>
##  1 (Intercept)       6.98      2.76     2.53  0.0143    
##  2 policy1           6.84      2.18     3.13  0.00276   
##  3 state2            1.00      1.58     0.634 0.529     
##  4 state3           11.9       2.35     5.07  0.00000472
##  5 state4            8.17      1.86     4.40  0.0000496 
##  6 factor(time)2     3.00      3.53     0.851 0.399     
##  7 factor(time)3     6.00      3.53     1.70  0.0944    
##  8 factor(time)4     9.00      3.53     2.55  0.0135    
##  9 factor(time)5    11.0       3.57     3.09  0.00308   
## 10 factor(time)6    14.5       3.57     4.07  0.000147  
## # … with 14 more rows

The coefficient of the policy term equals 6.84.

The question is, does this estimate come close to what it should be? Well, above we calculated effect estimates for four separate contrasts, where the fourth contrast did not satisfy the parallel trends assumption, but the other three contrasts did. So if I were summarizing across these clean contrasts I would want to make a summary estimate across the DID estimates of 18, 5, and 9. One question is, how much weight do we put on each of the estimates? Another question is, do we consider the DID estimate of 9 comparing earlier to later treated states? So, it isn’t straightforward what the estimate should be when summarizing across several contrasts. But what we show below based on the Goodman Bacon decomposition is that the TWFE estimator is influenced by the forbidden fourth contrast that doesn’t meet the parallel trends assumption, so we’d like an estimator that doesn’t suffer form this concern.

We can now use the Goodman Bacon decomposition to see how much weight each of the contrasts contributes to the effect estimate:

s7 %<>% mutate(state_n = as.numeric(as.character(state)),
               policy_n = as.numeric(as.character(policy)))

s7_bacon <- bacon(outcome ~ policy_n,
      data = s7,
      id_var = "state_n",
      time_var = "time")
##                       type  weight   avg_est
## 1 Earlier vs Later Treated 0.06715   9.00000
## 2 Later vs Earlier Treated 0.15108 -11.00000
## 3     Treated vs Untreated 0.78177  10.10429
s7_bacon
##   treated untreated estimate     weight                     type
## 2       5     99999       18 0.30695444     Treated vs Untreated
## 3      12     99999        5 0.47482014     Treated vs Untreated
## 6      12         5      -11 0.15107914 Later vs Earlier Treated
## 8       5        12        9 0.06714628 Earlier vs Later Treated

Notice that our fourth contrast (here shown in row 3 of the table) contributes 15% of the weight to the estimate. We want an estimator that does not use this contrast!

Below we confirm the weighted estimate across the rows equals the TWFE estimate from the regression:

#see the TWFE estimate
sum(s7_bacon$estimate*s7_bacon$weight)
## [1] 6.841727

Now that we know how to use the Callaway and Sant’Anna code, we can apply it to this example as well. Let’s start by running the att_gt() function. This time we won’t display the output from summary(), but feel free to uncomment that line to take a peak:

s7_cs <- att_gt(yname = "outcome", 
             tname = "time", 
             idname = "state_n", 
             gname = "time_first_treat", 
             data = s7, 
             anticipation = 0)
## Warning in pre_process_did(yname = yname, tname = tname, idname = idname, : Be aware that there are some small groups in your dataset.
##   Check groups: 0,5,12.
## Warning in att_gt(yname = "outcome", tname = "time", idname = "state_n", : Not
## returning pre-test Wald statistic due to singular covariance matrix
#summary(s7_cs)

Let’s aggregate the ATTs using type = simple. Callaway and Sant’Anna’s estimate considers only the clean contrasts of 1 and 2 and combines them by weighting by the total time in the post-treatment period. For contrast 1, there are 16 time periods post treatment and for contrast 2, there are 9 periods post-treatment. Thus this estimate is the weighted average: \(18\times(16/25) + 5\times(9/25)=13.32\).

#aggregate the group time average treatment effects
#type = "simple": weighted average of all group-time average treatment effects
#with weights proportional to the group size
s7_cs_ag <- aggte(s7_cs, type="simple")
summary(s7_cs_ag)
## 
## Call:
## aggte(MP = s7_cs, type = "simple")
## 
## Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Forthcoming at the Journal of Econometrics <https://arxiv.org/abs/1803.09015>, 2020. 
## 
## 
## Overall ATT:  
##    ATT Std. Error     [95%  Conf. Int.]  
##  13.32     4.4407    4.6164     22.0236 *
## 
## 
## ---
## Signif. codes: `*' confidence band does not cover 0
## 
## Control Group:  Never Treated,  Anticipation Periods:  0
## Estimation Method:  Doubly Robust

Alternatively, we can specify type = "group" to get separate effect estimates according to time of implementation and see that these match our by-hand calculations:

s7_cs_ag2 <- aggte(s7_cs, type = "group")
summary(s7_cs_ag2)
## 
## Call:
## aggte(MP = s7_cs, type = "group")
## 
## Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Forthcoming at the Journal of Econometrics <https://arxiv.org/abs/1803.09015>, 2020. 
## 
## 
## Overall ATT:  
##   ATT Std. Error     [95%  Conf. Int.]  
##  11.5          0      11.5        11.5 *
## 
## 
## Group Effects:
##  group ATT Std. Error [95% Simult.  Conf. Band]  
##      5  18          0            18          18 *
##     12   5          0             5           5 *
## ---
## Signif. codes: `*' confidence band does not cover 0
## 
## Control Group:  Never Treated,  Anticipation Periods:  0
## Estimation Method:  Doubly Robust

Alternatively, you can estimate the dynamic effect separately for each time since the treatment has been introduced. Remember that this effect estimation is also done for pre-treatment time, so don’t be alarmed by all the rows in this table!

s7_cs_ag3 <- aggte(s7_cs, type="dynamic")
summary(s7_cs_ag3)
## 
## Call:
## aggte(MP = s7_cs, type = "dynamic")
## 
## Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Forthcoming at the Journal of Econometrics <https://arxiv.org/abs/1803.09015>, 2020. 
## 
## 
## Overall ATT:  
##      ATT Std. Error     [95%  Conf. Int.]  
##  16.3125     2.5019   11.4089     21.2161 *
## 
## 
## Dynamic Effects:
##  event time  ATT Std. Error [95% Simult.  Conf. Band]   
##         -10  0.0     0.0000            NA          NA NA
##          -9  0.0     0.0000            NA          NA NA
##          -8  0.0     0.0000            NA          NA NA
##          -7  0.0     0.0000            NA          NA NA
##          -6  0.0     0.0000            NA          NA NA
##          -5  0.0     0.0000            NA          NA NA
##          -4  0.0     0.0000            NA          NA NA
##          -3  0.0     0.0000            NA          NA NA
##          -2  0.0     0.0000            NA          NA NA
##          -1  0.0     0.0000            NA          NA NA
##           0  2.0     0.7413            NA          NA NA
##           1  3.5     1.1120            NA          NA NA
##           2  5.0     1.4826            NA          NA NA
##           3  6.5     1.8533            NA          NA NA
##           4  8.0     2.2239            NA          NA NA
##           5  9.5     2.5946            NA          NA NA
##           6 11.0     0.0000            NA          NA NA
##           7 12.5     3.3359            NA          NA NA
##           8 14.0     3.7065            NA          NA NA
##           9 21.0     0.0000            NA          NA NA
##          10 23.0     0.0000            NA          NA NA
##          11 25.0     0.0000            NA          NA NA
##          12 27.0     0.0000            NA          NA NA
##          13 29.0     0.0000            NA          NA NA
##          14 31.0     0.0000            NA          NA NA
##          15 33.0     0.0000            NA          NA NA
## ---
## Signif. codes: `*' confidence band does not cover 0
## 
## Control Group:  Never Treated,  Anticipation Periods:  0
## Estimation Method:  Doubly Robust

In this case, it is very helpful to plot the effect estimates over time:

ggdid(s7_cs_ag3)

Here again we don’t display the results using type = "calendar" but have included the code here in case you want to run it.

# s7_cs_ag4 <- aggte(s7_cs, type = "calendar")
# summary(s7_cs_ag4)
# ggdid(s7_cs_ag4)

0.8.1 TWFE with time_since_policy specification

But what if model TWFE with the time_since_change indicators? Here I run the regression and filter() out the policy estimates since the regression table is so large:

s7_mod2 <- lm(outcome ~ time_since_policy + state + factor(time), 
             data = s7)
tidy(s7_mod2) %>% filter(str_detect(term, "time_since")) %>% 
  mutate(time = as.numeric(gsub("time_since_policy", "", term))) %>% 
  arrange(time) 
## # A tibble: 16 × 6
##    term                estimate std.error statistic  p.value  time
##    <chr>                  <dbl>     <dbl>     <dbl>    <dbl> <dbl>
##  1 time_since_policy1     0.621      1.13     0.547 5.87e- 1     1
##  2 time_since_policy2     2.06       1.13     1.81  7.73e- 2     2
##  3 time_since_policy3     4.11       1.16     3.56  9.55e- 4     3
##  4 time_since_policy4     5.64       1.16     4.88  1.63e- 5     4
##  5 time_since_policy5     7.17       1.16     6.21  2.19e- 7     5
##  6 time_since_policy6     8.70       1.16     7.53  2.96e- 9     6
##  7 time_since_policy7    10.2        1.16     8.85  4.59e-11     7
##  8 time_since_policy8    11.7        1.20     9.77  2.89e-12     8
##  9 time_since_policy9    13.3        1.20    11.1   6.07e-14     9
## 10 time_since_policy10   18.7        1.67    11.2   5.00e-14    10
## 11 time_since_policy11   20.8        1.67    12.5   1.48e-15    11
## 12 time_since_policy12   23.0        1.67    13.8   5.41e-17    12
## 13 time_since_policy13   25.2        1.67    15.1   2.41e-18    13
## 14 time_since_policy14   27.4        1.67    16.4   1.29e-19    14
## 15 time_since_policy15   29.5        1.68    17.6   1.04e-20    15
## 16 time_since_policy16   31.7        1.68    18.9   7.43e-22    16

The effect estimates on the policy indicators do not equal an average of the ATTs from the two states for each time period. For this example, this specification gives estimates that are systematically lower than those produced by the Callaway Sant’Anna model.

Takeway: When the treatment effect is staggered, dynamic, and hetergeneous, the TWFE regression specification will produced biased results, as will the specification using a categorical variable for time since treatment. In this case, it is best to use another approach like the Callaway Sant’Anna function.

0.9 Scenario 8 (24 states, staggered timing , dynamic and hetergenous treatment effects)

We wanted to make a more complex scenario that may be more aligned with typical cases, where states are the treated units, for example. We also added variability in the outcome variable measured over time, with different amounts of variability by state to reflect the case when some states are smaller than others.

all_states %<>% mutate(ever_trt = case_when(state > 25 ~ "treated", 
                                            state <= 25 ~ "untreated")
                       )

ggplot(all_states, aes(x = time, y = outcome)) + 
  geom_line(aes(col = factor(time_first_treat), group = state)) +
  geom_point(aes(fill = factor(time_first_treat), pch = ever_trt)) +
  labs(title = "Scenario 8 (Numerous states, staggered, dynamic, and heterogeneous)") +
  geom_vline(aes(xintercept = 4.5), lty = 2, col = treated5_green) + 
  geom_vline(aes(xintercept = 11.5), lty = 2, col = treated12_red) +
  scale_color_manual(values = c(control1_blue, treated5_green, treated12_red)) +
  scale_fill_manual(values = c(control1_blue, treated5_green, treated12_red)) +
  scale_shape_manual(values = c(23, 21)) + 
  theme_bw()

Visualization of all comparisons made by TWFE regression

We start by running the linear model as we’ve done previously:

s8_mod <- lm(outcome ~ policy + factor(state) + factor(time), 
             data = all_states)
tidy(s8_mod)
## # A tibble: 44 × 5
##    term            estimate std.error statistic  p.value
##    <chr>              <dbl>     <dbl>     <dbl>    <dbl>
##  1 (Intercept)        7.75       1.88     4.12  4.57e- 5
##  2 policy1            6.91       1.10     6.25  9.63e-10
##  3 factor(state)11    2.67       1.95     1.37  1.73e- 1
##  4 factor(state)12   -0.456      1.95    -0.234 8.15e- 1
##  5 factor(state)13    0.418      1.95     0.214 8.31e- 1
##  6 factor(state)14   -0.246      1.95    -0.126 9.00e- 1
##  7 factor(state)15   -1.27       1.95    -0.652 5.15e- 1
##  8 factor(state)20    0.836      1.95     0.428 6.69e- 1
##  9 factor(state)21   -4.02       1.95    -2.06  4.00e- 2
## 10 factor(state)22    0.590      1.95     0.302 7.63e- 1
## # … with 34 more rows

Up until now, we haven’t worried about the estimation of the 95% confidence interval. In particular, we need to acount for the clustering on data over time within state. To do that we use the vcovHC function from the sandwich package.

library(sandwich)
m2_var <- vcovHC(s8_mod)

We can then pull out the variance from the policy1 term from the variance-covariance matrix and use it to compute Wald type confidence intervals:

est = summary(s8_mod)$coefficients["policy1", "Estimate"] 
lb = summary(s8_mod)$coefficients["policy1", "Estimate"] - 1.96*sqrt(m2_var["policy1","policy1"]) 
ub = summary(s8_mod)$coefficients["policy1", "Estimate"] + 1.96*sqrt(m2_var["policy1","policy1"])

The estimated effect is 6.9 with a 95% CI from 4.6 to 9.3.

We can use the Bacon decomposition to examine the contrasts contributing to the TWFE estimate. I set up this scenario such that the effects should be the same as for scenario 7. In that scenario the estimates equalled:

  1. Treated at time 5 vs. never-treated: 18
  2. Treated at time 12 vs. never-treated: 5
  3. Treated at time 12 vs post-treatment time for states treated at time 5: -11
  4. Treated at time 5 vs pre-treatment time for states treated at time 12: 9

We can compare this to the output from the Goodman Bacon decomposition:

all_states %<>% mutate(state_n = as.numeric(as.character(state)),
               policy_n = as.numeric(as.character(policy)))

s8_bacon <- bacon(outcome ~ policy_n,
      data = all_states,
      id_var = "state_n",
      time_var = "time")
##                       type  weight   avg_est
## 1 Earlier vs Later Treated 0.06715   9.46683
## 2 Later vs Earlier Treated 0.15108 -10.59360
## 3     Treated vs Untreated 0.78177  10.06890
s8_bacon
##   treated untreated   estimate     weight                     type
## 2       5     99999  17.685455 0.30695444     Treated vs Untreated
## 3      12     99999   5.145060 0.47482014     Treated vs Untreated
## 6      12         5 -10.593604 0.15107914 Later vs Earlier Treated
## 8       5        12   9.466831 0.06714628 Earlier vs Later Treated

We can see that the estimates are not too far off. Comparing to the decomposition table from scenario 7, the weights have shifted. In particular here 24% of the weight is put on the later treated vs. never treated states, whereas this number was 47% in scenario 7. Worryingly, the weight on the forbidden contrast increased from 15% in scenario 7 to 31% in this scenario. This is because the weight is proportional to ….[TO FILL IN]

Compare to Callaway and Sant’Anna’s approach:

s8_cs <- att_gt(yname = "outcome", 
             tname = "time", 
             idname = "state_n", 
             gname = "time_first_treat", 
             data = all_states, 
             anticipation = 0)

s8_cs_ag <- aggte(s8_cs, type="simple")
summary(s8_cs_ag)
## 
## Call:
## aggte(MP = s8_cs, type = "simple")
## 
## Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Forthcoming at the Journal of Econometrics <https://arxiv.org/abs/1803.09015>, 2020. 
## 
## 
## Overall ATT:  
##      ATT Std. Error     [95%  Conf. Int.]  
##  13.9483     2.2834    9.4729     18.4237 *
## 
## 
## ---
## Signif. codes: `*' confidence band does not cover 0
## 
## Control Group:  Never Treated,  Anticipation Periods:  0
## Estimation Method:  Doubly Robust

NEED TO FIGURE THIS OUT… Their weighted estimator is equal to \(18 \times (25*6) + 5 \times (25*6)\)

Looking at the effects by group now:

s8_cs_ag2 <- aggte(s8_cs, type = "group")
summary(s8_cs_ag2)
## 
## Call:
## aggte(MP = s8_cs, type = "group")
## 
## Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Forthcoming at the Journal of Econometrics <https://arxiv.org/abs/1803.09015>, 2020. 
## 
## 
## Overall ATT:  
##      ATT Std. Error     [95%  Conf. Int.]  
##  12.0011     1.3627    9.3302      14.672 *
## 
## 
## Group Effects:
##  group     ATT Std. Error [95% Simult.  Conf. Band]  
##      5 18.9553     1.4183       16.2383     21.6723 *
##     12  5.0470     2.8063       -0.3288     10.4228  
## ---
## Signif. codes: `*' confidence band does not cover 0
## 
## Control Group:  Never Treated,  Anticipation Periods:  0
## Estimation Method:  Doubly Robust

(Was expecting 18 and 5 – possible we are off just due to random error - need to rerun with a different seed)

s8_cs_ag3 <- aggte(s8_cs, type="dynamic")
summary(s8_cs_ag3)
## 
## Call:
## aggte(MP = s8_cs, type = "dynamic")
## 
## Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Forthcoming at the Journal of Econometrics <https://arxiv.org/abs/1803.09015>, 2020. 
## 
## 
## Overall ATT:  
##     ATT Std. Error     [95%  Conf. Int.]  
##  16.951      1.384   14.2384     19.6636 *
## 
## 
## Dynamic Effects:
##  event time     ATT Std. Error [95% Simult.  Conf. Band]  
##         -10 -0.1829     2.3494       -6.4655      6.0997  
##          -9 -4.7005     2.2907      -10.8260      1.4250  
##          -8  6.5472     2.4453        0.0082     13.0862 *
##          -7 -3.8124     3.0567      -11.9863      4.3614  
##          -6 -0.1897     2.1720       -5.9978      5.6184  
##          -5 -1.7594     2.5558       -8.5937      5.0750  
##          -4  1.7046     1.7409       -2.9508      6.3601  
##          -3 -0.3681     2.6355       -7.4157      6.6794  
##          -2 -0.3533     1.8428       -5.2811      4.5744  
##          -1 -0.8363     2.2527       -6.8601      5.1876  
##           0  2.8462     1.8696       -2.1534      7.8457  
##           1  5.5198     1.5908        1.2659      9.7738 *
##           2  4.6567     1.7842       -0.1143      9.4277  
##           3  5.2618     2.3166       -0.9331     11.4567  
##           4 11.1385     2.4382        4.6184     17.6586 *
##           5 12.1587     3.0856        3.9076     20.4099 *
##           6  6.1909     2.9196       -1.6163     13.9982  
##           7 15.0719     3.2545        6.3691     23.7748 *
##           8 14.6472     2.7010        7.4245     21.8699 *
##           9 22.0564     2.5482       15.2424     28.8705 *
##          10 23.8719     1.3611       20.2320     27.5117 *
##          11 27.7298     1.8740       22.7186     32.7411 *
##          12 26.6291     3.2753       17.8707     35.3876 *
##          13 29.5526     2.1161       23.8941     35.2112 *
##          14 31.6321     2.4494       25.0822     38.1820 *
##          15 32.2518     3.4996       22.8935     41.6101 *
## ---
## Signif. codes: `*' confidence band does not cover 0
## 
## Control Group:  Never Treated,  Anticipation Periods:  0
## Estimation Method:  Doubly Robust
ggdid(s8_cs_ag3)

Now that we have confidence intervals, we can assess the parallel trends assumption by investigating if the pre-treatment CIs overlap the null and are randomly distributed around the zero line. This is true in our example, so we wouldn’t be concerned about violating the parallel trends assumption with these data.

New addition Sun Abraham

table(all_states$time_first_treat, useNA = "always")
## 
##    0    5   12 <NA> 
##  240  120  120    0
all_states$time_first_treat2 <- all_states$time_first_treat
all_states$time_first_treat2[all_states$time_first_treat == 0] <- Inf
table(all_states$time_first_treat, all_states$time_first_treat2, useNA = "always")
##       
##          5  12 Inf <NA>
##   0      0   0 240    0
##   5    120   0   0    0
##   12     0 120   0    0
##   <NA>   0   0   0    0
s8_sa <- staggered_sa(df = all_states,
                      i = "state_n",
                      t = "time",
                      g = "time_first_treat2",
                      y = "outcome", 
                      estimand = "simple")

s8_sa$estimate #12.12 estimate of the simple weighted average
## [1] 13.94829
#95% CI
round(c(s8_sa$estimate, s8_sa$estimate - 1.96*s8_sa$se, s8_sa$estimate + 1.96*s8_sa$se), 2)
## [1] 13.95 12.22 15.68
round(c(s8_sa$estimate, s8_sa$estimate - 1.96*s8_sa$se_neyman, s8_sa$estimate + 1.96*s8_sa$se_neyman), 2)
## [1] 13.95 11.46 16.43

0.10 To Do

  • Explain how the weight is calculated in the Bacon decomp – when would it place a lot of weight on the forbidden contrasts? I have another R file called Bacon weights where I try and implement the weight formulas but I get stuck…
  • When will the weight be negative? I still haven’t figured out what scenario this corresponds to. Possibly in a setting where there are no ever-treated states?
  • Need to say something about standard error estimation - the code as written is not concerned with SE at all. So may want to say something or modify the code.
  • CONSIDER IMPLEMENTING THE STAGGERED REGRESSION APPRAOCH
  • DISCUSSION OF DIFFERENT TARGET PARAMETERS, AND WHY SOME MIGHT BE OF MORE INTEREST THAN OTHERS
  • https://causalinf.substack.com/p/illustrating-the-bias-of-twfe-in?utm_source=url maybe try implementing this example
# s8 <- read_xlsx(here("data", "scenarios.xlsx"), sheet = "scen8", 
#                 col_types = c("text", "text", "text", "numeric"))
# 
# s8 %<>% mutate(state_n = as.numeric(as.character(state)),
#                policy_n = as.numeric(as.character(policy)))

# s8_bacon <- bacon(formula = outcome ~ policy_n, 
#                   data = s8,
#                   id_var = "state_n",
#                   time_var = "time")

```